Journal of Computer Applications (《计算机应用》), official website


Technology term recognition with comprehensive constituency parsing

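ZHU Junjie (朱俊杰)1, YU Li (余丽)2, LI Shengwen (李圣文)1, ZHOU Changzheng (周长征)3

  1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078;
  2. China Engineering Science and Technology Frontier Interdisciplinary Strategy Research Center (中国工程科技前沿交叉战略研究中心), Beijing Institute of Technology, Beijing 100081;
  3. Shiyan Juneng Electric Power Design Co., Ltd. (十堰巨能电力设计有限公司), Shiyan, Hubei 442012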


  • Received: 2023-05-05 Revised: 2023-10-12 Accepted: 2023-10-12 Online: 2023-12-04 Published: 2023-12-04
  • Corresponding author: LI Shengwen (李圣文)
  • Supported by:
    National Natural Science Foundation of China

Technology term recognition with comprehensive constituency parsing

  • Received:2023-05-05 Revised:2023-10-12 Accepted:2023-10-12 Online:2023-12-04 Published:2023-12-04

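Abstract: Technology terms are terms used to communicate information accurately. Recognizing them automatically helps experts and the public discover, understand, and apply new technologies, and is therefore of significant value. Unsupervised methods are the mainstream approach to technology term recognition, but they suffer from complex rules and poor adaptability. To improve the recognition of technology terms in text, a technology term recognition method that integrates constituency parsing is proposed. First, a syntactic structure tree is built through constituency parsing. Second, candidate technology terms are extracted from two perspectives, top-down and bottom-up. Finally, statistical frequency and semantic information are fused to select the optimal technology terms. In addition, a technology term dataset is constructed to verify the effectiveness of the proposed method. Experimental results on this dataset show that, compared with a traditional dependency-based method, the proposed method improves the F1 score by 7.62 percentage points. A case study in the 3D printing field further shows that the recognized technology terms are consistent with the development of the field. They can be used to trace the development history of technologies and to depict their evolution paths, providing a reference for understanding, discovering, and exploring future technologies in the field.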

Keywords: technology term recognition, constituency parsing, unsupervised method, constituency parse tree, term extraction

Abstract: Technology terms are used to communicate accurate information. Automatically identifying technology terms from text can help experts and the public to discover, recognize, and apply new technologies, which is of great value. Although unsupervised methods are one of the mainstream approaches to technology term recognition, they still have limitations such as complex rules and poor adaptability. To enhance the ability to recognize technology terms from text, an unsupervised technology term recognition method based on constituency parsing was proposed. Firstly, a syntactic structure tree was constructed through constituency parsing. Then, candidate technology terms were extracted from both top-down and bottom-up perspectives. Finally, statistical frequency and semantic information were combined to determine the most appropriate technology terms. In addition, a technology term dataset was constructed to validate the effectiveness of the proposed method. Experimental results on this dataset show that the proposed method improves the F1 score by 7.62 percentage points compared with traditional dependency-based methods. Meanwhile, a case study in the field of 3D printing shows that the identified technology terms are in line with the development of the field and can be used to trace the development process of technology and depict its evolution path, so as to provide references for understanding, discovering, and exploring future technologies.

Key words: technology term recognition, constituency parsing, unsupervised method, constituent parse tree, term extraction
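
The three steps named in the abstract (constituency tree, top-down and bottom-up candidate extraction, frequency-based selection) can be illustrated with a minimal sketch. The abstract does not give the actual extraction rules or the semantic scoring, so everything below is an assumption for illustration only: parser output is assumed to arrive as Penn-style bracketed strings, "top-down" is read as taking the outermost noun phrases, "bottom-up" as taking the innermost noun phrases, and the selection step uses sentence-level frequency alone. The helper names (`parse_bracketed`, `top_down`, `bottom_up`, `rank_candidates`) are hypothetical and do not come from the paper.

```python
from collections import Counter


def parse_bracketed(text):
    """Read a Penn-style bracketed tree into nested (label, children) tuples.

    Leaves (words) are plain strings. Assumes well-formed input.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def contains_np(node):
    """True if some proper descendant of the node is labelled NP."""
    if isinstance(node, str):
        return False
    return any(
        not isinstance(c, str) and (c[0] == "NP" or contains_np(c))
        for c in node[1]
    )


def top_down(node, out):
    """Assumed top-down rule: keep the outermost NP on each path, then stop."""
    if isinstance(node, str):
        return
    if node[0] == "NP":
        out.append(" ".join(leaves(node)))
        return
    for child in node[1]:
        top_down(child, out)


def bottom_up(node, out):
    """Assumed bottom-up rule: keep NPs that contain no nested NP."""
    if isinstance(node, str):
        return
    if node[0] == "NP" and not contains_np(node):
        out.append(" ".join(leaves(node)))
    for child in node[1]:
        bottom_up(child, out)


def rank_candidates(bracketed_sentences):
    """Rank candidates by the number of sentences they occur in.

    Frequency only; the semantic information mentioned in the abstract
    is not modelled here.
    """
    counts = Counter()
    for text in bracketed_sentences:
        tree = parse_bracketed(text)
        candidates = []
        top_down(tree, candidates)
        bottom_up(tree, candidates)
        # Count each candidate once per sentence, keeping first-seen order.
        counts.update(list(dict.fromkeys(candidates)))
    return counts.most_common()
```

Under these assumptions, a long outer phrase such as "selective laser sintering of metal powder" is proposed by the top-down pass, while its inner phrases "selective laser sintering" and "metal powder" are proposed by the bottom-up pass. The frequency ranking then favours whichever span recurs across sentences.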

CLC number: